# **Importing Dependencies**

In [0]:
import pickle
import os

import imageio
import tqdm

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

tf.compat.v1.disable_eager_execution() 
tfK = tf.keras

# **Allowing for Parallelized Model Training**

By default, TensorFlow reserves nearly all available GPU memory for the current process. Enabling memory growth makes it allocate GPU memory only as needed, so that several models can be trained in parallel on the same GPU.

In [0]:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

# **Loading the Feature Extractor**

In [0]:
model_path = "./models/trained_cnn_2.h5"
model = tfK.models.load_model(model_path)

We use the trained CNN as a **feature extractor**. To do this, we build a second model that shares the CNN's input but outputs the flattened activations of its last convolutional block (the `flatten_4` layer), "chopping off" the dense and dropout layers that follow. Each image fed to this model yields 8192 features:

In [0]:
intermediate_layer_model = tfK.models.Model(inputs=model.input,
                                            outputs=model.get_layer("flatten_4").output)

# **Loading Data**

In [0]:
with open("ordered_slices_by_patient_randsubset.pkl", "rb") as f:
    patients_pkl = pickle.load(f)

label_df = pd.read_csv("labels_cleaned.csv")
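# regex=False treats ".png" as a literal string (as a regex, "." would match any character)
label_df["ID_nopng"] = label_df["ID"].str.replace(".png", "", regex=False)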
ID_list = label_df["ID_nopng"].tolist()

# **Preparing the Data for Feature Extraction**

For some slice IDs listed in the data, the *actual image data* (the PNG) is missing. Here, we drop those slice IDs:

In [0]:
patients_pkl_clean = dict()

for key, item in patients_pkl.items():
    tmp = []
    for slice_id in item:
        if os.path.isfile("./Windowed-PNGs-FINAL-comb/" + slice_id + ".png"):
            tmp.append(slice_id)

    patients_pkl_clean[key] = tmp
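
The same filtering step can be sketched as a small reusable function. This is only an illustration under assumptions: `keep_existing_slices` is a hypothetical helper that is not part of the notebook, and the demonstration uses a temporary directory with made-up slice IDs rather than the real image folder.

```python
import os
import tempfile


def keep_existing_slices(patients, image_dir):
    """Return a new dict that keeps only slice IDs whose PNG exists in image_dir."""
    return {
        patient_id: [
            slice_id
            for slice_id in slice_ids
            if os.path.isfile(os.path.join(image_dir, slice_id + ".png"))
        ]
        for patient_id, slice_ids in patients.items()
    }


# Toy demonstration with made-up IDs: only "ID_a.png" exists on disk.
with tempfile.TemporaryDirectory() as image_dir:
    open(os.path.join(image_dir, "ID_a.png"), "wb").close()
    demo = keep_existing_slices({"p1": ["ID_a", "ID_b"], "p2": ["ID_c"]}, image_dir)

print(demo)
```

Note that a patient whose PNGs are all missing is kept with an empty list, just as in the loop above; such patients are only dropped once a minimum slice count is enforced.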

Next, we determine how many brain slices each patient's CT scan contains (and what the smallest number of slices in any CT scan is):

In [0]:
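# Count slices per patient after the missing PNGs have been removed.
# The variable is named min_len so that the built-in min() is not shadowed.
min_len = float("inf")
lens = []

for key, item in patients_pkl_clean.items():
    if len(item) < min_len:
        min_len = len(item)
    lens.append(len(item))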
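
To judge how many patients a given sequence length would retain, the slice counts can be summarized in one pass. The following is a minimal sketch, assuming a dict that maps patient IDs to lists of slice IDs; `slice_count_summary` and the toy data are hypothetical and not part of the notebook.

```python
def slice_count_summary(patients, n_required):
    """Summarize slice counts per patient for a {patient_id: [slice_id, ...]} dict."""
    sorted_lens = sorted(len(slice_ids) for slice_ids in patients.values())
    return {
        "shortest": sorted_lens[0],
        "longest": sorted_lens[-1],
        # Number of patients with at least n_required slices
        "n_long_enough": sum(length >= n_required for length in sorted_lens),
    }


# Toy data: three patients with 30, 10, and 24 slices.
toy_patients = {
    "p1": ["s"] * 30,
    "p2": ["s"] * 10,
    "p3": ["s"] * 24,
}
summary = slice_count_summary(toy_patients, n_required=24)
print(summary)
```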

We find that some CT scans do not contain enough slices for our **sequential approach**. We therefore keep only patients with at least `n_slices` (24) slices and, for each of them, retain the `n_slices` slices centered on the middle of the scan:

In [0]:
n_slices = 24

patients_long_enough = dict()
for key, item in patients_pkl_clean.items():
    if len(item) >= n_slices:
        mid_slice = len(item)//2
        truncated_slice_IDs = item[mid_slice - n_slices//2:mid_slice + n_slices//2]
        patients_long_enough[key] = truncated_slice_IDs
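
The centered truncation can be checked in isolation on a toy list. This is a sketch, not the notebook's code: `center_window` is a hypothetical helper. It computes the window as `start:start + n_slices`, which matches the slicing above for even `n_slices` (such as 24) and also returns exactly `n_slices` items when `n_slices` is odd.

```python
def center_window(slice_ids, n_slices):
    """Return the n_slices items centered on the middle of slice_ids.

    Returns None when the sequence is too short to be used.
    """
    if len(slice_ids) < n_slices:
        return None
    mid = len(slice_ids) // 2
    start = mid - n_slices // 2
    return slice_ids[start:start + n_slices]


# A 30-slice scan keeps indices 3..26; a 10-slice scan is rejected.
long_scan = center_window(list(range(30)), 24)
short_scan = center_window(list(range(10)), 24)
print(long_scan[0], long_scan[-1], short_scan)
```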

Finally, we verify that we still have enough patients left to adequately train our sequential-convolutional model (indeed, 2418 patients remain):

In [0]:
n_patients = len(patients_long_enough)
n_features = 8192

len(patients_long_enough)

2418

# **Performing the Feature Extraction**

We extract features for the training of our **bidirectional LSTM** by feeding all training PNGs through our previously trained CNN and, for each PNG, collecting the 8192 flattened activations of its last convolutional block:

In [0]:
# This list will contain the extracted features for all training PNGs
data_list = []
# List of corresponding labels for the extracted features
label_list = []

for i, (patient_ID, slice_IDs) in enumerate(tqdm.tqdm(patients_long_enough.items())):
    data_patient_list = []
    label_patient_list = []
    for j, slice_ID in enumerate(slice_IDs):
        # Load respective PNG
        png_array = np.expand_dims(imageio.imread("./Windowed-PNGs-FINAL-comb/" + slice_ID + ".png"), 0)
        # Extract features
        layer_features = intermediate_layer_model.predict(png_array).flatten()
        
        data_patient_list.append(layer_features)
        
        label_patient_list.append(label_df[label_df["ID_nopng"]==slice_ID]["any"].iloc[0])

    data_list.append(data_patient_list)
    label_list.append(label_patient_list)
    
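# Every patient contributes exactly n_slices slices, so the lists stack into regular arrays:
# data_array has shape (n_patients, n_slices, n_features), label_array (n_patients, n_slices)
data_array = np.array(data_list)
label_array = np.array(label_list)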
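
The loop above filters `label_df` once per slice, which becomes slow for large label tables. One possible alternative, shown here only as a sketch, is to build a dictionary from slice ID to label once and look labels up in constant time. `build_label_lookup` is a hypothetical helper, and the plain lists below stand in for the `ID_nopng` and `any` columns; keeping the first occurrence of a duplicated ID is chosen to mirror the `.iloc[0]` in the loop.

```python
def build_label_lookup(slice_ids, labels):
    """Map each slice ID to its label, keeping the first label for duplicated IDs."""
    lookup = {}
    for slice_id, label in zip(slice_ids, labels):
        lookup.setdefault(slice_id, label)  # ignore later duplicates
    return lookup


# Toy stand-ins for the ID and label columns ("ID_a" appears twice).
toy_ids = ["ID_a", "ID_b", "ID_a"]
toy_labels = [1, 0, 0]
lookup = build_label_lookup(toy_ids, toy_labels)

# Labels for one patient's slices, in slice order.
patient_labels = [lookup[slice_id] for slice_id in ["ID_b", "ID_a"]]
print(patient_labels)
```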

We write the extracted features and corresponding labels to disk (`np.save` appends the `.npy` extension to each file name):

In [0]:
np.save("rcnn-data-array", data_array)
np.save("rcnn-label-array", label_array)